canon: add telemetry-validation-gate constraint #210
Merged
Adds tier-1 canon defining the single smoke-and-verify gate for the telemetry Emission Contract: enumerate registered tools, drive one synthetic call per tool per surface, compare emitted bytes/tokens against locally-computed expectations. No time bound. No statistical threshold. Sample size of 1 per tool per surface is sufficient because the wrapper is deterministic. Supersedes the implicit '24-hour soak' framing in odd/handoffs/2026-05-14-telemetry-coverage-completeness, which assumed organic load oddkit does not actually receive. Notes that release-validation-gate Rule 2 is arguably not triggered by wrapper-only changes (no response-envelope change, no tool add/remove). If a future wrapper change touches load-bearing surface in the Rule 2 sense, both gates apply. derives_from telemetry-governance, release-validation-gate, performed-prudence-anti-pattern.
Canon Quality — Frontmatter Schema ✅ All 41 file(s) in Validator
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Gate procedure computes expected values from wrong inputs
- Updated steps 2 and 3 of the gate procedure to record and measure the in-memory `args` object and `{ content: [...] }` envelope (matching the wrapper's actual emission inputs per telemetry-governance Rule 2) instead of the full HTTP request/response bodies.
Preview (e46ca204fd)
diff --git a/canon/constraints/telemetry-validation-gate.md b/canon/constraints/telemetry-validation-gate.md
new file mode 100644
--- /dev/null
+++ b/canon/constraints/telemetry-validation-gate.md
@@ -1,0 +1,114 @@
+---
+uri: klappy://canon/constraints/telemetry-validation-gate
+title: "Telemetry Validation Gate — Smoke Every Tool, Verify Every Number"
+audience: canon
+exposure: nav
+tier: 1
+voice: neutral
+stability: evolving
+tags: ["canon", "constraint", "telemetry", "validation", "smoke-test", "wrapper-correctness", "release-pipeline", "analytics-engine"]
+epoch: E0008
+date: 2026-05-15
+derives_from: "canon/constraints/telemetry-governance.md, canon/constraints/release-validation-gate.md, canon/observations/performed-prudence-anti-pattern.md"
+complements: "canon/decisions/DR-20260514-0001-telemetry-wrapper-pattern.md, canon/observations/2026-05-14-telemetry-coverage-gap-quantified.md"
+governs: "Every release that touches the telemetry Emission Contract surface in oddkit and TruthKit"
+status: active
+---
+
+# Telemetry Validation Gate — Smoke Every Tool, Verify Every Number
+
+> The Emission Contract requires every registered tool to emit accurate metered usage on every call. Verifying it is one smoke pass per surface: hit every tool, compare the emitted `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` against the request and response that were actually sent. If the numbers match expectations within tokenizer noise (3–4% for `cl100k_base`), the wrapper is working. There is no soak period, no organic-load threshold, no statistical sample bar. Synthetic traffic is the only traffic; the wrapper is deterministic; one call per tool is sufficient.
+
+---
+
+## Summary — Stop Pretending Sample Size Buys Confidence
+
+oddkit's hosted service does not see enterprise-scale organic traffic. Real consumers number in the low single digits at any given moment, and most of those are the maintainer themselves. A validation model built around "wait for 24 hours of organic load and check per-tool coverage at 95%" is performed prudence — it inflates statistical ceremony around a question that does not need statistics to answer.
+
+The actual question is: does the per-tool wrapper emit the correct metered values when a known payload passes through it? That question is deterministic. The wrapper is code. Either it reads the JSON-stringified args and envelope, runs `cl100k_base` over them, and writes the result to Analytics Engine — or it doesn't. One call with a known input and known output answers the question completely.
+
+The gate is therefore: drive a synthetic smoke pass across every registered tool on every active deployment surface (main preview and prod after promotion). For each call, compare the emitted numeric fields against what the smoke driver actually sent and received. Tokenizer noise of 3–4% for English-prose payloads is the only legitimate variance; anything else is a bug.
+
+Sample size is one per tool per surface. Increase it for operator margin if desired, but the canon bar is one. There is no time bound. There is no organic-load requirement. If the smoke pass shows accurate numbers across every tool, the wrapper is verified.
+
+---
+
+## The Gate
+
+**When:** After any PR touching `withTelemetry`, tool registration, or the emission envelope is deployed to a surface — main preview after merge to `main`, or prod after the `main → prod` promotion. Run the gate against each surface the change reaches, before declaring that surface verified.
+
+**Question it answers:** Does the wrapper emit accurate `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` for every registered tool?
+
+**Procedure:**
+
+1. Enumerate every `server.tool()` registration in `workers/src/index.ts`. This is the smoke target list.
+2. Drive one synthetic call per tool through the surface's `/mcp` endpoint. Record the exact `args` object sent (the JSON-RPC `params.arguments` payload) and the exact `{ content: [...] }` envelope returned by the handler — not the full HTTP request/response bodies, which include JSON-RPC framing the wrapper does not see.
+3. For each call, compute the expected values locally against the same in-memory values the wrapper measures per `klappy://canon/constraints/telemetry-governance` Rule 2: `bytes_in = utf8_byte_length(JSON.stringify(args))`, `bytes_out = utf8_byte_length(JSON.stringify(content_envelope))`, `tokens_in = cl100k_count(JSON.stringify(args))`, `tokens_out = cl100k_count(JSON.stringify(content_envelope))`. For SSE-streamed responses, expected `bytes_out = 0` and `tokens_out = 0` per the Emission Contract.
+4. Query `oddkit_telemetry` with `event_type = 'tool_call'`, `worker_version = <surface-version>`, and a timestamp window covering the smoke run.
+5. Match each emitted row to the corresponding smoke call (by tool name and timing). Compare emitted versus expected on all four fields.
+
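As an illustrative sketch only (not the actual oddkit smoke driver), steps 2, 3, and 5 can be expressed as follows. `cl100kCount` is a stand-in for a real `cl100k_base` tokenizer such as js-tiktoken; it is stubbed here so the sketch stays self-contained, and a real gate run must use the real encoding.

```typescript
// Illustrative sketch of steps 2, 3, and 5 (not the actual oddkit driver).
// Expected values are computed from the same in-memory inputs the wrapper
// measures: the validated `args` object and the `{ content: [...] }` envelope.

type Expected = {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
};

const utf8ByteLength = (s: string): number => new TextEncoder().encode(s).length;

// Stand-in tokenizer: a real gate must run cl100k_base (e.g. via js-tiktoken).
const cl100kCount = (s: string): number => Math.ceil(s.length / 4);

function expectedFor(args: unknown, envelope: unknown, sse = false): Expected {
  const argsJson = JSON.stringify(args);
  const envJson = JSON.stringify(envelope);
  return {
    bytes_in: utf8ByteLength(argsJson),
    bytes_out: sse ? 0 : utf8ByteLength(envJson), // SSE-streamed: 0 per the contract
    tokens_in: cl100kCount(argsJson),
    tokens_out: sse ? 0 : cl100kCount(envJson),
  };
}

// Step 5: emitted vs expected, within tokenizer noise (±5%).
const withinNoise = (emitted: number, expected: number, tol = 0.05): boolean =>
  expected === 0 ? emitted === 0 : Math.abs(emitted - expected) / expected <= tol;
```

Under these assumptions each queried `oddkit_telemetry` row is checked field by field against `expectedFor(...)` for the matching smoke call, with the SSE branch covering streamed responses.
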
+**Pass:** Every registered tool appears in the telemetry dataset, and every emitted numeric field is within tokenizer noise (±5%) of the expected value computed locally.
+
+**Fail (missing tool):** Any registered tool is absent from the dataset after smoke. The wrapper is not attached to that registration. Block downstream work on this surface; fix forward.
+
+**Fail (wrong number):** Any emitted field is off by more than the noise floor. The wrapper is attached but emission is inaccurate. Investigate; fix; re-smoke.
+
+**Sample threshold:** One call per tool per surface is sufficient. The wrapper is deterministic; a second call with the same input emits the same output. Higher sample counts are operator discretion for cutover margin, not canon requirement.
+
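For illustration, the JSON-RPC framing a hypothetical smoke driver would POST to a surface's `/mcp` endpoint in step 2 can be built like this; `buildCallBody` is a sketch, and only the `tools/call` field shape follows MCP's JSON-RPC framing:

```typescript
// Hypothetical smoke-driver helper: builds the JSON-RPC tools/call body the
// driver POSTs to a surface's /mcp endpoint. The wrapper never sees this outer
// framing; it measures only params.arguments and the handler's envelope.
function buildCallBody(
  id: number,
  tool: string,
  args: Record<string, unknown>,
): string {
  return JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  });
}
```

The driver records the `args` object and the returned result envelope, not this outer body, so the locally computed expectations line up with what the wrapper actually measured.
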
+---
+
+## Why No Time Bound
+
+oddkit's hosted service receives sparse, mostly maintainer-driven traffic. "Wait 24 hours and check organic coverage" is a pattern borrowed from systems where organic traffic actually fills the sample space. Here it does not. A 24-hour window after promotion produces a dataset dominated by maintainer test calls and a handful of synthetic probes — the same data the smoke pass produces immediately, just delayed.
+
+Time bounds are appropriate for systems where the question is whether the wrapper behaves correctly under unforeseen load patterns the operator cannot manufacture — a real concern for services running thousands of QPS across heterogeneous clients. oddkit answers a smaller question: do the numbers come out right for the payloads we send? That is fully answered by deliberate exercise.
+
+Removing the time bound also removes a class of failure mode: orchestrators waiting passively for a soak window to mature, mistaking elapsed time for validation work. The smoke pass is active verification with a definite endpoint.
+
+---
+
+## Why Synthetic Is Enough
+
+The Emission Contract specifies in-memory measurement after Zod validation and before MCP transport framing. The wrapper does not care whether the call originated from a manufactured smoke probe or a real consumer; it sees the same `args` object and the same `{ content: [...] }` envelope. Synthetic and organic traffic produce identical telemetry rows when the payload sizes match.
+
+Synthetic traffic has an additional advantage that organic does not: the smoke driver knows the exact request and response bytes locally. Organic traffic only produces emitted values in the dataset; the ground truth is not directly observable. Verification against organic load is necessarily a sanity check against expected ranges, not against known values. The smoke pass is the stricter test.
+
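A minimal sketch of that identity, with schematic names (`withTelemetry` and `emit` here are illustrative, not the actual oddkit implementation):

```typescript
// Schematic per-tool wrapper: measures the validated args object and the
// { content: [...] } envelope in memory, after validation and before any
// transport framing. Synthetic and organic calls hit the same code path.
type Envelope = { content: unknown[] };
type Row = { tool: string; bytes_in: number; bytes_out: number };

const byteLen = (v: unknown): number =>
  new TextEncoder().encode(JSON.stringify(v)).length;

function withTelemetry(
  tool: string,
  handler: (args: unknown) => Envelope,
  emit: (row: Row) => void,
): (args: unknown) => Envelope {
  return (args) => {
    const envelope = handler(args);
    // Same measurement regardless of traffic origin.
    emit({ tool, bytes_in: byteLen(args), bytes_out: byteLen(envelope) });
    return envelope;
  };
}
```

Because the measurement happens on the in-memory values, a smoke probe and a real consumer sending the same payload produce byte-identical telemetry rows.
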
+---
+
+## Cross-Surface Coverage
+
+The wrapper deploys to whichever surface receives the code. Currently that is two surfaces:
+
+- **Main preview** at `https://main-oddkit.klappy.workers.dev/mcp` — auto-deployed by Cloudflare on every merge to `main` in `klappy/oddkit`.
+- **Production** at `https://oddkit.klappy.dev/mcp` — deployed when the `main → prod` promotion PR merges.
+
+Each surface must be smoke-verified independently. Verifying main preview does not verify prod; the surfaces run independent worker versions and could in principle diverge.
+
+When the program adds TruthKit or any other oddkit-pattern MCP server, the same gate applies to each of those surfaces.
+
+---
+
+## Relationship to release-validation-gate Rule 2
+
+`klappy://canon/constraints/release-validation-gate` Rule 2 requires fresh-context validator dispatch on promotion PRs that touch load-bearing surface. "Load-bearing surface" is defined there by response-envelope changes, new or removed tool registrations, governance file reads, matcher algorithm changes, and `workers/src/orchestrate.ts` modifications. The telemetry wrapper does not change any of these — callers observe identical responses; no tools are added or removed; no governance reads change.
+
+A wrapper change is therefore arguably outside Rule 2's trigger. The orchestrator may smoke-verify directly per this gate without dispatching a fresh-context validator, provided the smoke pass shows accurate numbers across every tool on every surface.
+
+If a future wrapper change *does* touch load-bearing surface (for example, exposing new envelope fields to callers), Rule 2 fires in addition to this gate, and both must be satisfied.
+
+---
+
+## Receipts
+
+- `klappy://canon/observations/2026-05-14-telemetry-coverage-gap-quantified` — the diagnostic that motivated the Emission Contract and exposed how prior time-bound validation hid the actual coverage problem.
+- `klappy://canon/decisions/DR-20260514-0001-telemetry-wrapper-pattern` — decision record for the wrapper architecture this gate verifies.
+- `klappy://canon/observations/performed-prudence-anti-pattern` — the failure mode this gate is structured to avoid (statistical ceremony around a deterministic question).
+- `klappy://odd/handoffs/2026-05-14-telemetry-coverage-completeness` — original handoff whose "24-hour soak" framing this canon supersedes.
+
+---
+
+## See Also
+
+- `klappy://canon/constraints/telemetry-governance` — the Emission Contract this gate verifies.
+- `klappy://canon/constraints/release-validation-gate` — separate constraint covering promotion-PR fresh-context review.
+- `klappy://canon/constraints/measure-before-you-object` — the methodology that argues against theoretical objections to empirical answers; applies here against statistical-threshold arguments to deterministic questions.
Reviewed by Cursor Bugbot for commit 3f52f07.

Adds tier-1 canon defining the single gate for verifying the telemetry Emission Contract.
Why this exists
The handoff at `klappy://odd/handoffs/2026-05-14-telemetry-coverage-completeness` ran a "24-hour soak validator" framing on the post-PR-#157 cutover. In session it became clear that framing is incoherent for oddkit specifically: there is no organic load to soak against, and "wait for organic ≥95% coverage on every tool" is unmeetable against manufactured smoke traffic. The actual question — does the wrapper emit the numbers we expect for the payloads we send? — is deterministic and answerable in a single smoke pass per surface.
What the gate is
One synthetic call per `server.tool()` registration, per surface; verify emitted `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` (cl100k_base) against locally computed values. No time bound. No sample threshold beyond 1/tool/surface. No statistical ceremony.
Relationship to release-validation-gate
Rule 2 there triggers on response-envelope changes, tool add/remove, governance-read changes, and orchestrate.ts edits. Wrapper-only changes touch none of these — callers see identical responses. This PR notes the orchestrator may smoke-verify directly per this new gate when Rule 2 is not triggered. If a future wrapper change does touch load-bearing surface in the Rule 2 sense, both gates apply.
Gauntlet (Writing Canon checklist)
- derives_from

Receipts

- `klappy://canon/observations/2026-05-14-telemetry-coverage-gap-quantified` — diagnostic
- `klappy://canon/decisions/DR-20260514-0001-telemetry-wrapper-pattern` — decision record
- `klappy://canon/observations/performed-prudence-anti-pattern` — the failure mode this gate avoids
- `klappy://odd/handoffs/2026-05-14-telemetry-coverage-completeness` — superseded soak framing

Note
Low Risk
Low risk: adds a new tier-1 canon constraint document only, with no code or runtime behavior changes.
Overview
Adds a new tier-1 canon constraint, `telemetry-validation-gate`, defining the required release gate for validating the telemetry Emission Contract.

The gate replaces time-bound/organic-traffic "soak" expectations with a single synthetic smoke call per registered tool per deployment surface, and requires comparing emitted `bytes_in`/`bytes_out`/`tokens_in`/`tokens_out` against locally computed ground truth (with an explicit noise tolerance and SSE streaming exception). It also clarifies how this gate interacts with `release-validation-gate` Rule 2 (wrapper-only changes can be verified via smoke without triggering fresh-context validation unless load-bearing surface changes).

Reviewed by Cursor Bugbot for commit e46ca20.